Abstract

We propose an incentive scheme based on intervention to sustain cooperation among self-interested
users. In the proposed scheme, an intervention device collects imperfect signals about the actions
of the users for a test period, and then chooses the level of intervention that degrades the
performance of the network for the remaining time period. We analyze the problems of designing an
optimal intervention rule given a test period and choosing an optimal length of the test period.
The intervention device can provide the incentive for cooperation by exerting intervention
following signals that involve a high likelihood of deviation. Increasing the length of the test
period has two counteracting effects on the performance: It improves the quality of signals, but
at the same time it weakens the incentive for cooperation due to increased delay.

1 Introduction

This paper studies incentive schemes to sustain cooperation among self-interested users sharing
a common network resource. When users utilize the network resource considering only their own
self-interest, a problem known as the tragedy of the commons [1] is likely to occur, yielding
suboptimal performance. Different methods to overcome the problem have been investigated in the
literature. One method widely studied in economics and engineering is pricing [2]. Pricing can
induce efficient use of network resources by internalizing negative externalities. Although pricing
has a solid theoretical foundation, implementing a pricing scheme can be impractical or cumbersome
in some cases. Let us consider a wireless Internet service as an example. A service provider can
limit access to its network resources by charging an access fee. However, charging an access fee
requires a secure and reliable method to process payments, which creates a burden on both users and
service providers. There also arises the issue of allocative fairness when a service provider charges for the
Internet service. In the presence of the income effect, pricing will bias the allocation of network 
resources towards users with high incomes. Because the Internet can play the role of an informa- 
tion equalizer, it has been argued in a public policy debate that access to the Internet should be 
provided as a public good by a public authority rather than as a private good in a market [3] . 

Another method popular in game theory is to use repeated interaction [4]. Repeated interaction
can encourage cooperative behavior by adjusting future payoffs depending on current behavior. A 
repeated game strategy can form a basis of an incentive scheme in which monitoring and punishment 
burden is decentralized to users (see, for example, [5]). However, implementing a repeated game 
strategy requires repeated interaction among users, which may not be available. For example, the set
of users interacting in a mobile network changes frequently by nature.

In this paper, we use an alternative method based on intervention, which was proposed in our
previous work [6]. In an incentive scheme based on intervention, a network is augmented with
an intervention device that is able to monitor the actions of users and to take an action that 
affects the payoffs of users. In [6], we considered an ideal scenario where the intervention device 
can observe the actions of users without errors immediately after users choose their actions. In 
this paper, we consider a more realistic scenario where the intervention device can obtain only 
imperfect information about the actions of users and it takes time for the intervention device 
to collect signals. Intervention directly affects the network usage of users, unlike pricing which 
uses an outside instrument to affect the payoffs of users. Thus, an incentive scheme based on 
intervention can provide an effective and robust method to induce cooperation in that users cannot 
avoid intervention as long as they use network resources. Moreover, it does not require a long-term
relationship among users, which makes it applicable to networks with a dynamically changing user
population. 

2 Model and Problem Formulation 

We consider a communication channel shared by users. Time is divided into slots of equal length, 
and in each slot a user can attempt to transmit its packet or wait. If there is only one transmission 
attempt in a slot, the packet is successfully transmitted. If there is more than one transmission 
attempt in a slot, packets collide and no transmission is successful. For simplicity, we assume that 
each user can choose one of two transmission probabilities $p_l$ and $p_h$, where $p_l = 1/N < p_h < 1$.
Note that each user choosing $p_l$ maximizes the total throughput, defined as the average number
of successfully transmitted packets per time slot, assuming that all the users choose the same
transmission probability [7].
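
The claim that a common transmission probability of 1/N maximizes total throughput can be checked numerically. The following is a minimal sketch, assuming the standard slotted-Aloha expression N p (1 - p)^(N-1) for symmetric throughput; the function name `total_throughput`, the value `n_users = 5`, and the search grid are illustrative choices, not part of the model.

```python
def total_throughput(p: float, n: int) -> float:
    """Expected number of successful packets per slot when all n users transmit with probability p."""
    # Each user succeeds with probability p * (1 - p)**(n - 1); there are n users.
    return n * p * (1.0 - p) ** (n - 1)


n_users = 5  # illustrative value
grid = [i / 1000 for i in range(1, 1000)]  # candidate probabilities 0.001, ..., 0.999
best_p = max(grid, key=lambda p: total_throughput(p, n_users))
print(best_p, 1 / n_users)
```

On this grid the maximizer coincides with 1/N, which is the cooperative action $p_l$ in the model.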

We consider a period consisting of T consecutive time slots, and analyze interaction in the period 
without any consideration of past or future periods. We assume that the number of users and their 
transmission probabilities are fixed throughout a period. Let $\mathcal{N} = \{1, \ldots, N\}$ be the set of the users.
The action space of user $i$ is denoted by $A_i = \{p_l, p_h\}$, and the action of user $i$ is denoted by $a_i \in A_i$,
for all $i \in \mathcal{N}$. An action profile is represented by a vector $a = (a_1, \ldots, a_N) \in A \triangleq \prod_{i \in \mathcal{N}} A_i$. The
payoff of user $i$ is given by the number of its successfully transmitted packets per time slot. Then the
expected payoff of user $i$ is given by the probability of its successful transmission, $a_i \prod_{j \in \mathcal{N} \setminus \{i\}} (1 - a_j)$.
It is easy to see that the action $p_h$ is a dominant strategy for every user. Hence, $(p_h, \ldots, p_h)$ is
the unique Nash equilibrium, which yields a lower total throughput than the symmetric social
optimum $(p_l, \ldots, p_l)$.
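
The dominance argument can be illustrated by brute-force enumeration. The sketch below assumes a small instance with three users, $p_l = 1/3$ and $p_h = 0.8$ (values chosen here for illustration only); the helper `payoff` is a hypothetical name for the success probability given above. By symmetry it is enough to check user 0.

```python
from itertools import product
from math import prod


def payoff(i, profile):
    """Success probability of user i: it transmits and every other user waits."""
    others_wait = prod(1.0 - a for j, a in enumerate(profile) if j != i)
    return profile[i] * others_wait


n_users, p_l, p_h = 3, 1 / 3, 0.8  # illustrative values with p_l = 1/N < p_h < 1

# p_h is dominant for user 0 if it beats p_l against every profile of the others.
p_h_is_dominant = all(
    payoff(0, (p_h,) + others) > payoff(0, (p_l,) + others)
    for others in product((p_l, p_h), repeat=n_users - 1)
)

nash_total = sum(payoff(i, (p_h,) * n_users) for i in range(n_users))
social_total = sum(payoff(i, (p_l,) * n_users) for i in range(n_users))
print(p_h_is_dominant, nash_total, social_total)
```

In this instance the all-defect profile yields a much lower total throughput than the all-cooperate profile, which is the inefficiency the intervention device is meant to address.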

To reduce the inefficiency of the Nash equilibrium, we introduce an intervention device in
the system. The intervention device is capable of monitoring the actions of the users and interfering 
in the transmission of the users. The intervention device can sense the channel to learn whether the 
channel is idle (i.e., no user attempts to transmit its packet) or busy (i.e., at least one user attempts 
to transmit its packet). We consider a scenario where the intervention device collects signals from 
sensing the channel for the first $t$ slots, where $1 \le t < T$, and then chooses its transmission
probability, which can be interpreted as the intervention level.

Let $S = \{\text{idle}, \text{busy}\}$ be the set of all possible signals obtained in a slot. Then the set of
all possible signals that the intervention device can obtain for the $t$ slots is $S^t$. The probability
distribution of signals is independent across slots, and when the users choose action profile $a$, the
probability of obtaining an idle signal in a slot is given by $q(a) = \prod_{i \in \mathcal{N}} (1 - a_i)$. After obtaining
$t$ signals, the intervention device chooses a transmission probability in $[0, 1]$, which remains fixed
until the end of the period. We use subscript 0 for the intervention device. The action space
of the intervention device is denoted by $A_0 = [0, 1]$, and its action is denoted by $a_0 \in A_0$. The
decision rule of the intervention device, called the intervention rule, can be represented by a function
$f : S^t \to A_0$. Since the transmission probabilities of the users do not change in a period, there is
no gain for the intervention device to distinguish signals from different slots. Hence, we focus on
the class of intervention rules that use only the number of idle signals, which can be represented
by $f : \{0, 1, \ldots, t\} \to A_0$. The probability that $k$ idle signals arise out of $t$ signals when the users
choose action profile $a$ is $\binom{t}{k} q(a)^k (1 - q(a))^{t-k}$, for $k = 0, 1, \ldots, t$. Note that monitoring is imperfect
in the sense that the intervention device cannot observe the action profile of the users but obtains
only imperfect information about the action profile.

The sequence of events in a period can be listed as follows.

1. At the beginning of the period, the users choose their transmission probabilities $a \in A$, which
are used from slot 1 to slot $T$, knowing the intervention rule $f$ adopted by the intervention
device.

2. The intervention device collects signals from slot 1 to slot $t$.

3. The intervention device intervenes using the transmission probability prescribed by the
intervention rule $f$ from slot $t + 1$ to slot $T$.

The payoff of user $i$ when the users choose action profile $a$ and the intervention device chooses
action $a_0$ is given by

\begin{align}
u_i(a_0, a) &= \frac{t}{T}\, a_i \prod_{j \in \mathcal{N} \setminus \{i\}} (1 - a_j) + \frac{T - t}{T}\, a_i (1 - a_0) \prod_{j \in \mathcal{N} \setminus \{i\}} (1 - a_j) \tag{1} \\
&= \left[ 1 - \frac{T - t}{T}\, a_0 \right] a_i \prod_{j \in \mathcal{N} \setminus \{i\}} (1 - a_j). \tag{2}
\end{align}

The action profile of the users influences the probability distribution of signals, which in turn affects
the action of the intervention device. The expected payoff of user $i$ when the users choose action
profile $a$ and the intervention device uses intervention rule $f$ can be expressed as

\begin{align}
v_i(f, a) &= \sum_{k=0}^{t} \binom{t}{k} q(a)^k (1 - q(a))^{t-k}\, u_i(f(k), a) \tag{3} \\
&= \left[ 1 - \frac{T - t}{T} \sum_{k=0}^{t} \binom{t}{k} q(a)^k (1 - q(a))^{t-k} f(k) \right] a_i \prod_{j \in \mathcal{N} \setminus \{i\}} (1 - a_j). \tag{4}
\end{align}

Note that $\sum_{k=0}^{t} \binom{t}{k} q(a)^k (1 - q(a))^{t-k} f(k)$ can be interpreted as the expected transmission
probability of the intervention device, while $(T - t)/T$ is the weight on the slots in which the action of
the intervention device affects the users.
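
The expected payoff described above can be evaluated directly. The following is a minimal sketch, assuming the intervention rule is supplied as a list `rule` of length t + 1 whose k-th entry is the device's transmission probability after k idle signals; the function names and the example values (five users choosing 0.2, t = 10, T = 100) are illustrative.

```python
from math import comb, prod


def idle_probability(profile):
    """q(a): probability that no user transmits in a slot."""
    return prod(1.0 - a for a in profile)


def expected_payoff(i, profile, rule, t, T):
    """Expected per-slot payoff of user i when the device follows `rule` after t test slots."""
    q = idle_probability(profile)
    # Expected transmission probability of the device under the binomial signal distribution.
    expected_a0 = sum(
        comb(t, k) * q**k * (1.0 - q) ** (t - k) * rule[k] for k in range(t + 1)
    )
    no_intervention = profile[i] * prod(
        1.0 - a for j, a in enumerate(profile) if j != i
    )
    # Intervention only affects the last T - t slots of the period.
    return (1.0 - (T - t) / T * expected_a0) * no_intervention


profile = (0.2,) * 5   # illustrative: five users, all choosing 0.2
never = [0.0] * 11     # rule that never intervenes (t = 10)
always = [1.0] * 11    # rule that always transmits with probability 1 after the test period
print(expected_payoff(0, profile, never, 10, 100))
print(expected_payoff(0, profile, always, 10, 100))
```

With the always-intervene rule, the user keeps only the throughput earned during the test period, that is, a fraction t/T of its payoff without intervention.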
For notation, let us define

\begin{align}
\lambda(k; t) &= \binom{t}{k} \left[ (1 - p_l)^N \right]^k \left[ 1 - (1 - p_l)^N \right]^{t-k}, \tag{5} \\
\mu(k; t) &= \binom{t}{k} \left[ (1 - p_l)^{N-1} (1 - p_h) \right]^k \left[ 1 - (1 - p_l)^{N-1} (1 - p_h) \right]^{t-k}, \tag{6}
\end{align}

for $k = 0, 1, \ldots, t$, and let $\tau_c = p_l (1 - p_l)^{N-1}$ and $\tau_d = p_h (1 - p_l)^{N-1}$. $\lambda(k; t)$ is the probability of
$k$ idle signals arising out of $t$ signals when every user cooperates (i.e., chooses $p_l$), while $\mu(k; t)$ is
that when exactly one user defects (i.e., chooses $p_h$). $\tau_c$ is the cooperation throughput that each
user obtains when all the users choose $p_l$, while $\tau_d$ is the defection throughput that a user obtains
when it deviates to $p_h$ unilaterally. Note that an idle signal is more likely to occur when every user
cooperates than when some user defects. Also, note that $\tau_d > \tau_c$, which reflects the positive gain
from defection when there is no intervention.
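
As a concrete illustration of these definitions, the sketch below computes the two signal distributions and the two throughputs. The parameters N = 5, $p_l$ = 0.2 and $p_h$ = 0.8 are the ones used in the numerical example of Section 3; the test length t = 10 and the function name `signal_distributions` are assumptions made here for illustration.

```python
from math import comb


def signal_distributions(n, p_l, p_h, t):
    """Return (lam, mu): distributions of the idle-signal count under cooperation / one defection."""
    q_c = (1.0 - p_l) ** n                      # idle probability when all users cooperate
    q_d = (1.0 - p_l) ** (n - 1) * (1.0 - p_h)  # idle probability when exactly one user defects
    lam = [comb(t, k) * q_c**k * (1.0 - q_c) ** (t - k) for k in range(t + 1)]
    mu = [comb(t, k) * q_d**k * (1.0 - q_d) ** (t - k) for k in range(t + 1)]
    return lam, mu


n, p_l, p_h, t = 5, 0.2, 0.8, 10  # t = 10 is an arbitrary illustrative choice
lam, mu = signal_distributions(n, p_l, p_h, t)
tau_c = p_l * (1.0 - p_l) ** (n - 1)
tau_d = p_h * (1.0 - p_l) ** (n - 1)
print(round(tau_c, 5), round(tau_d, 5), round(sum(lam), 6), round(sum(mu), 6))
```

Both lists sum to one, and the expected number of idle signals is larger under cooperation than under a single defection, in line with the remark that idle signals are more likely when every user cooperates.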

Suppose that there is a network manager who determines the intervention rule used by the 
intervention device. The objective of the manager is to maximize the sum of the payoffs (i.e., total 
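throughput) while sustaining cooperation among the users. The cooperation payoff is given by

$$\left[ 1 - \frac{T - t}{T} \sum_{k=0}^{t} \lambda(k; t) f(k) \right] \tau_c, \tag{7}$$

while the defection payoff is

$$\left[ 1 - \frac{T - t}{T} \sum_{k=0}^{t} \mu(k; t) f(k) \right] \tau_d. \tag{8}$$

Hence, the incentive constraint for the users to cooperate can be written as

$$\left[ 1 - \frac{T - t}{T} \sum_{k=0}^{t} \lambda(k; t) f(k) \right] \tau_c \ge \left[ 1 - \frac{T - t}{T} \sum_{k=0}^{t} \mu(k; t) f(k) \right] \tau_d, \tag{9}$$

and the problem of designing an intervention rule can be expressed as

\begin{align}
\max_{f} \quad & N \tau_c \left[ 1 - \frac{T - t}{T} \sum_{k=0}^{t} \lambda(k; t) f(k) \right] \tag{10} \\
\text{subject to} \quad & \left[ 1 - \frac{T - t}{T} \sum_{k=0}^{t} \lambda(k; t) f(k) \right] \tau_c \ge \left[ 1 - \frac{T - t}{T} \sum_{k=0}^{t} \mu(k; t) f(k) \right] \tau_d, \tag{11} \\
& 0 \le f(k) \le 1 \text{ for all } k = 0, \ldots, t. \tag{12}
\end{align}


3 Analysis of the Design Problem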



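The design problem (10)-(12) can be rewritten as a linear programming (LP) problem,

\begin{align}
\min_{f} \quad & \sum_{k=0}^{t} \lambda(k; t) f(k) \tag{13} \\
\text{subject to} \quad & \frac{T - t}{T} \sum_{k=0}^{t} \left[ \tau_d \mu(k; t) - \tau_c \lambda(k; t) \right] f(k) \ge \tau_d - \tau_c, \tag{14} \\
& 0 \le f(k) \le 1 \text{ for all } k = 0, \ldots, t. \tag{15}
\end{align}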

The LP problem (13)-(15) is to minimize the expected transmission probability of the intervention
device while satisfying the incentive constraint and the probability constraints. Exerting
intervention is necessary to punish a deviation, but at the same time intervention incurs efficiency loss under
imperfect monitoring. Therefore, the manager wants to use the minimum possible intervention level
while providing the incentive for cooperation. The left-hand side of the incentive constraint (14) is
the expected loss from deviation due to the change in the probability distribution of signals induced
by deviation, while the right-hand side is the gain from deviation.
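
The step from the incentive constraint (9) to the linear form (14) is a short rearrangement. The following worked derivation spells it out, writing $w = (T - t)/T$ as a shorthand introduced here for readability.

```latex
\begin{align*}
\Bigl[1 - w \sum_{k=0}^{t} \lambda(k;t) f(k)\Bigr] \tau_c
  &\ge \Bigl[1 - w \sum_{k=0}^{t} \mu(k;t) f(k)\Bigr] \tau_d \\
\iff\quad
w \sum_{k=0}^{t} \tau_d\, \mu(k;t) f(k) - w \sum_{k=0}^{t} \tau_c\, \lambda(k;t) f(k)
  &\ge \tau_d - \tau_c \\
\iff\quad
w \sum_{k=0}^{t} \bigl[\tau_d\, \mu(k;t) - \tau_c\, \lambda(k;t)\bigr] f(k)
  &\ge \tau_d - \tau_c .
\end{align*}
```

Since $N \tau_c w > 0$ whenever $t < T$, maximizing the objective in (10) is equivalent to minimizing $\sum_{k=0}^{t} \lambda(k;t) f(k)$, which gives the objective in (13).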

Lemma 1. Suppose that an optimal solution to the LP problem (13)-(15) exists. Then the incentive
constraint (14) is satisfied with equality at the optimal solution.

Proof. Let $f^*$ be an optimal solution. Suppose that $[(T - t)/T] \sum_{k=0}^{t} [\tau_d \mu(k; t) - \tau_c \lambda(k; t)] f^*(k) >
\tau_d - \tau_c$. Since $\tau_d > \tau_c$, there exists $k'$ such that $\tau_d \mu(k'; t) - \tau_c \lambda(k'; t) > 0$ and $f^*(k') > 0$. Then
we can reduce $f^*(k')$ while satisfying the incentive constraint and the probability constraint for $k'$,
which decreases the objective value since $\lambda(k; t) > 0$ for all $k$. This contradicts the optimality of
$f^*$. □

Lemma 1 validates the intuition that the manager wants to use a punishment just enough to 
prevent deviation. The following proposition provides a necessary and sufficient condition for the 
LP problem to have a feasible solution, and the structure of an optimal solution. 

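Proposition 1. Let $k_0 = \max\{k : \tau_d \mu(k; t) - \tau_c \lambda(k; t) > 0\}$. Then the LP problem has a feasible
solution if and only if

$$\frac{T - t}{T} \sum_{k \le k_0} \left[ \tau_d \mu(k; t) - \tau_c \lambda(k; t) \right] \ge \tau_d - \tau_c. \tag{16}$$

Moreover, if the LP problem has a feasible solution, then there exists a unique optimal solution $f^*$
described by

$$f^*(k) = \begin{cases}
1 & \text{if } k < \bar{k}, \\[4pt]
\dfrac{\frac{T}{T - t} (\tau_d - \tau_c) - \sum_{k'=0}^{\bar{k} - 1} \left[ \tau_d \mu(k'; t) - \tau_c \lambda(k'; t) \right]}{\tau_d \mu(\bar{k}; t) - \tau_c \lambda(\bar{k}; t)} & \text{if } k = \bar{k}, \\[4pt]
0 & \text{if } k > \bar{k},
\end{cases} \tag{17}$$

where

$$\bar{k} = \min\left\{ k' : \frac{T - t}{T} \sum_{k \le k'} \left[ \tau_d \mu(k; t) - \tau_c \lambda(k; t) \right] \ge \tau_d - \tau_c \right\}. \tag{18}$$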
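
The threshold structure in Proposition 1 lends itself to a direct greedy computation. The sketch below is one possible way to evaluate the optimal rule without an LP solver, assuming $t < T$; the function name `optimal_rule` and the convention of returning `None` for an infeasible instance are choices made here for illustration.

```python
from math import comb


def optimal_rule(n, p_l, p_h, t, T):
    """Threshold rule of Proposition 1 as a list f[0..t], or None if the LP is infeasible."""
    q_c = (1.0 - p_l) ** n
    q_d = (1.0 - p_l) ** (n - 1) * (1.0 - p_h)
    tau_c = p_l * (1.0 - p_l) ** (n - 1)
    tau_d = p_h * (1.0 - p_l) ** (n - 1)
    lam = [comb(t, k) * q_c**k * (1.0 - q_c) ** (t - k) for k in range(t + 1)]
    mu = [comb(t, k) * q_d**k * (1.0 - q_d) ** (t - k) for k in range(t + 1)]
    # Contribution of signal k to the left-hand side of (14), per unit of f(k).
    gain = [tau_d * m - tau_c * l for m, l in zip(mu, lam)]
    needed = T / (T - t) * (tau_d - tau_c)  # remaining deterrence still to be provided
    rule = [0.0] * (t + 1)
    for k in range(t + 1):
        if gain[k] <= 0.0:  # beyond k_0: no remaining signal can help
            return None
        if gain[k] >= needed:  # threshold signal: intervene just enough
            rule[k] = needed / gain[k]
            return rule
        rule[k] = 1.0
        needed -= gain[k]
    return None


print(optimal_rule(5, 0.2, 0.8, 2, 100))
print(optimal_rule(5, 0.2, 0.8, 1, 100))
```

With the parameters of the numerical example below (N = 5, $p_l$ = 0.2, $p_h$ = 0.8, T = 100), the rule for t = 2 intervenes fully after zero idle signals, partially after one idle signal, and not at all after two, while t = 1 admits no feasible rule.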



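Proof. Define the likelihood ratio of signal $k$ (i.e., $k$ idle signals out of $t$ signals) by

$$L(k; t) = \frac{\mu(k; t)}{\lambda(k; t)}. \tag{19}$$

It is easy to see that $L(0; t) > 1$, $L(t; t) < 1$, and $L(k; t)$ is monotonically decreasing in $k$. Note
that $\tau_d \mu(k; t) - \tau_c \lambda(k; t) > 0$ if and only if $L(k; t) > p_l / p_h$. Hence, $k_0$ is well-defined, and
$\tau_d \mu(k; t) - \tau_c \lambda(k; t) > 0$ if and only if $k \le k_0$. If (16) is satisfied, then $f$ defined by $f(k) = 1$ for all $k \le k_0$
and $f(k) = 0$ for all $k > k_0$ is a feasible solution. To prove the converse, suppose that a feasible
solution, say $f$, exists. Then we have

$$\frac{T - t}{T} \sum_{k \le k_0} \left[ \tau_d \mu(k; t) - \tau_c \lambda(k; t) \right] \ge \frac{T - t}{T} \sum_{k=0}^{t} \left[ \tau_d \mu(k; t) - \tau_c \lambda(k; t) \right] f(k) \tag{20}$$

and

$$\frac{T - t}{T} \sum_{k=0}^{t} \left[ \tau_d \mu(k; t) - \tau_c \lambda(k; t) \right] f(k) \ge \tau_d - \tau_c, \tag{21}$$

and combining the two yields (16).

To prove the second result, suppose that the LP problem has a feasible solution. Then there
exists a feasible solution, say $f$, that satisfies the incentive constraint with equality. Define the
likelihood ratio of $f$ by

$$l(f) = \frac{\sum_{k=0}^{t} \mu(k; t) f(k)}{\sum_{k=0}^{t} \lambda(k; t) f(k)}. \tag{22}$$

Then the objective value in (13) at $f$ can be expressed as

$$\sum_{k=0}^{t} \lambda(k; t) f(k) = \frac{T}{T - t} \cdot \frac{\tau_d - \tau_c}{\tau_d\, l(f) - \tau_c}. \tag{23}$$

Hence, the objective value decreases as $f$ has a larger likelihood ratio. To optimize the objective
value, $f$ should put the probabilities on the signals starting from signal 0 to signal 1, and so on,
until the incentive constraint is satisfied with equality. Thus, we obtain $\bar{k}$, where $0 \le \bar{k} \le k_0$, that
is associated with the unique optimal solution. □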
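
The properties of the signal likelihood ratio used in the proof can be checked numerically. The sketch below assumes the ratio of the defection and cooperation signal probabilities, in which the binomial coefficients cancel; the parameters N = 5, $p_l$ = 0.2, $p_h$ = 0.8 come from the numerical example in this section, while t = 10 and the function name `likelihood_ratio` are illustrative choices.

```python
def likelihood_ratio(k, t, n, p_l, p_h):
    """Ratio mu(k; t) / lambda(k; t); the binomial coefficients cancel."""
    q_c = (1.0 - p_l) ** n                      # idle probability under cooperation
    q_d = (1.0 - p_l) ** (n - 1) * (1.0 - p_h)  # idle probability under one defection
    return (q_d / q_c) ** k * ((1.0 - q_d) / (1.0 - q_c)) ** (t - k)


n, p_l, p_h, t = 5, 0.2, 0.8, 10  # t = 10 chosen for illustration
ratios = [likelihood_ratio(k, t, n, p_l, p_h) for k in range(t + 1)]
# k_0 is the largest signal whose likelihood ratio still exceeds p_l / p_h.
k_0 = max(k for k, r in enumerate(ratios) if r > p_l / p_h)
print(k_0, ratios[0] > 1.0, ratios[t] < 1.0)
```

The ratios decrease strictly in k, so the set of useful signals is always an initial segment {0, ..., k_0}, which is why the optimal rule takes a threshold form.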

Since a smaller number of idle signals gives a higher likelihood ratio, an intervention rule yields 
a smaller efficiency loss when intervention is exerted following a smaller number of idle signals. 
Put differently, signal $k$ provides a stronger indication of defection as $k$ is smaller. However, using
only signal 0 may not be sufficient to provide the incentive for cooperation, in which case other
signals need to be used as well. Using signal $k$ with $k \le k_0$ contributes to providing the incentive for
cooperation, although the "quality" of the signal decreases as $k$ increases. Hence, it is optimal for
the manager to use signals with small $k$ primarily, which yields a threshold $\bar{k}$.

So far we have analyzed the problem of designing an intervention rule when the total number 
of signals, t, is fixed. Now we consider a scenario where the manager can choose t as well as an 
intervention rule. In this scenario, there are two counteracting effects of increasing $t$. First, note
that the objective value in (10), at an intervention rule $f$ that satisfies the incentive constraint
with equality, can be expressed as

$$N \left[ 1 - \frac{\tau_d - \tau_c}{\tau_d\, l(f) - \tau_c} \right] \tau_c, \tag{24}$$

which shows that increasing $t$ affects the objective value only through $l(f)$. Since $L(k; t)$ is increasing
in $t$, we can achieve a larger likelihood ratio $l(f)$ with larger $t$. In other words, as the intervention
device collects more signals, the information becomes more accurate (quality effect). On the other
hand, increasing $t$ decreases the weight given on the slots in which intervention is applied, which
makes the incentive constraint harder to satisfy (delay effect).

Let $\tau^*(t)$ be the optimal value of the design problem (10)-(12), where we set $\tau^*(t) = N p_h (1 - p_h)^{N-1}$
if there is no feasible solution with $t$. The problem of finding an optimal number of signals
can be written as $\max_{1 \le t < T} \tau^*(t)$. In general, $\tau^*(t)$ is a non-monotonic function of $t$, and we
provide a numerical example to illustrate the result. We consider system parameters $N = 5$,
$p_l = 1/N = 0.2$, $p_h = 0.8$, and $T = 100$. Then we have $\tau_c = 0.08$ and $\tau_d = 0.33$. The numerical
results show that the LP problem is infeasible for $t = 1$ and $t \ge 21$. With $t = 1$, there is not
sufficient information based on which intervention can provide the incentive for cooperation. With
$t \ge 21$, the delay effect is dominant, which prevents the incentive constraint from being satisfied. Figure 1
plots $\tau^*(t)$ for $t = 2, \ldots, 20$. We can see that $\tau^*(t)$ is non-monotonic while reaching the maximum
at $t = 18$ with $\tau^*(18) = 0.37$. In the plot, the dotted line represents the total throughput at
$(p_l, \ldots, p_l)$, $N \tau_c$. The difference between $\tau^*(t)$ and $N \tau_c$ can be interpreted as the efficiency loss
due to imperfect monitoring.¹ Lastly, we note that $\bar{k}$ in Proposition 1 is non-decreasing in $t$, with
$\bar{k} = 1$ for $t = 2, \ldots, 7$, $\bar{k} = 2$ for $t = 8, \ldots, 13$, $\bar{k} = 3$ for $t = 14, \ldots, 18$, and $\bar{k} = 4$ for $t = 19, 20$.
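
The search over the test length can be sketched in a few lines. The code below assumes the threshold structure of Proposition 1 to evaluate the optimal total throughput for each t, and falls back to the no-intervention Nash throughput when no feasible rule exists; the function name `optimal_throughput` is hypothetical, and the parameters are those of the numerical example above. The reported optimum at t = 18 can be cross-checked against its output.

```python
from math import comb


def optimal_throughput(n, p_l, p_h, t, T):
    """Optimal total throughput for test length t, or the Nash value if the LP is infeasible."""
    q_c = (1.0 - p_l) ** n
    q_d = (1.0 - p_l) ** (n - 1) * (1.0 - p_h)
    tau_c = p_l * (1.0 - p_l) ** (n - 1)
    tau_d = p_h * (1.0 - p_l) ** (n - 1)
    needed = T / (T - t) * (tau_d - tau_c)  # deterrence still required
    expected_a0 = 0.0                       # accumulates sum_k lambda(k; t) f(k)
    for k in range(t + 1):
        lam = comb(t, k) * q_c**k * (1.0 - q_c) ** (t - k)
        mu = comb(t, k) * q_d**k * (1.0 - q_d) ** (t - k)
        gain = tau_d * mu - tau_c * lam
        if gain <= 0.0:  # past k_0 without meeting the incentive constraint
            break
        if gain >= needed:  # threshold signal reached: partial intervention
            expected_a0 += lam * needed / gain
            return n * tau_c * (1.0 - (T - t) / T * expected_a0)
        expected_a0 += lam
        needed -= gain
    return n * p_h * (1.0 - p_h) ** (n - 1)


values = {t: optimal_throughput(5, 0.2, 0.8, t, 100) for t in range(1, 100)}
best_t = max(values, key=values.get)
print(best_t, round(values[best_t], 2))
```

Sweeping t in this way makes the two effects visible: small t leaves too little information to deter deviation cheaply, while large t leaves too few slots in which intervention can act.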



4 Conclusion 

We have studied the problem of designing incentive schemes based on the idea of intervention to 
sustain cooperation among users sharing network resources in the case of imperfect monitoring. We 
have used a simple model to present the main ideas and results without too many complications. 
Our model can be extended in several directions, among which we mention two. First, users can 
use more complicated decision rules than the one choosing one of two transmission probabilities. 
The action space for a user can be relaxed to $[0, 1]$ instead of $\{p_l, p_h\}$. Also, users can have an
ability to monitor the actions of other users and the intervention device. In such a scenario, we
can study intervention rules to sustain a cooperative decision rule, where a decision rule for a
user is a mapping from its information set to its action space. Second, the set of signals that the
intervention device can obtain in a slot can be generalized. For example, a signal can be ternary
(idle, success, collision) or the number of users that attempted to transmit. We can investigate how
optimal intervention rules and their performance change as the intervention device obtains finer
information about the actions of users. Finally, we conclude with a remark that incentive schemes
based on intervention can be applied to a wide range of networks where cooperative behavior should
be encouraged. Potential applications include communication networks (power control, congestion
control, and medium access control) and peer-to-peer networks.

Figure 1: Plot of $\tau^*(t)$ for $t = 2, \ldots, 20$.

¹ If the intervention device can observe the actions of the users immediately, it can use the threat of transmitting
with probability 1 when a deviation is detected to sustain cooperation without incurring an efficiency loss.